Geometry transformation can be done using specialized hardware. Achieving the floating-point computation rates required for a given level of transformation performance at a reasonable cost is challenging. Two techniques are typically used to increase the throughput of transformation hardware: pipelining and parallelism.
Although pipelining and parallelism are discussed here in the context of geometry transformation (a stage of the general graphics pipeline to which they are particularly applicable), both techniques can be used throughout the graphics pipeline. The per-fragment operations performed by multiple image processors in Figure 4 are an example of parallelism. The general graphics pipeline is itself amenable to pipelining (hence the name).
While pipelining is straightforward, it is not always as flexible as parallelism. The stages in a hardware pipeline tend to be hard-wired, so it is difficult to add new stages. In addition, if the work required for any single stage grows out of proportion to the other stages, that stage becomes a bottleneck that undermines the efficiency of the whole pipeline. For example, OpenGL implementations typically support up to eight light sources. If a single pipeline stage performs all lighting calculations, that stage might back up the pipeline when many lights are enabled. The problem could be solved by dividing the lighting stage into eight separate hardware stages. This is feasible, but most of the time fewer than eight lights are enabled, so the extra lighting stages would go largely unused.
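The bottleneck effect can be illustrated with a minimal timing model. The sketch below is not a description of any real hardware: the function names, the four stages, and the unit costs are all hypothetical, and the model assumes identical work items and stages that each hold one item at a time.

```python
def pipeline_time(stage_costs, n_items):
    """Time units for n_items identical work items to drain through a
    linear pipeline in which each stage holds one item at a time."""
    if n_items == 0:
        return 0
    # The first item pays every stage's cost; each later item emerges
    # one bottleneck-stage interval after the previous one.
    return sum(stage_costs) + (n_items - 1) * max(stage_costs)


def stage_utilization(stage_costs):
    """Steady-state fraction of time that each stage is busy."""
    bottleneck = max(stage_costs)
    return [cost / bottleneck for cost in stage_costs]


# Hypothetical stages: transform, lighting, clip, project.
# Assumption: lighting costs one time unit per enabled light.
one_light = [1, 1, 1, 1]
eight_lights = [1, 8, 1, 1]

print(pipeline_time(one_light, 1000))     # 1003
print(pipeline_time(eight_lights, 1000))  # 8003
print(stage_utilization(eight_lights))    # [0.125, 1.0, 0.125, 0.125]
```

Under these assumed costs, enabling eight lights makes the pipeline roughly eight times slower even though the total work per vertex grows by less than a factor of three, because the three non-lighting stages sit idle most of the time.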
Parallelism may be more flexible than pipelining since each unit of work is executed largely independently of the work executing on other parallel processors. In the case of eight lights, each processor simply performs more lighting calculations than it would for a single light. Unlike a hardware pipeline, parallel processors can absorb the extra work without leaving other hardware underutilized.
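A similarly simple model shows how parallel processors absorb the extra lighting work. As a sketch only, it assumes that vertices are divided evenly among the processors, that each processor performs every step for its own vertices, and that the per-vertex costs (3 units of fixed work plus 1 unit per enabled light) are hypothetical values chosen for illustration.

```python
import math


def per_vertex_cost(n_lights):
    # Hypothetical costs: 3 units for transform, clip, and project,
    # plus 1 unit for each enabled light.
    return 3 + n_lights


def parallel_time(cost_per_vertex, n_vertices, n_processors):
    """Time units when each processor runs the whole per-vertex job
    on its own share of the vertices."""
    # The busiest processor handles ceil(n_vertices / n_processors) vertices.
    vertices_per_processor = math.ceil(n_vertices / n_processors)
    return vertices_per_processor * cost_per_vertex


print(parallel_time(per_vertex_cost(1), 1000, 4))  # 1000
print(parallel_time(per_vertex_cost(8), 1000, 4))  # 2750
```

In this model the running time grows by 11/4, exactly the ratio of the total work per vertex, because every processor stays busy regardless of how the work is split between lighting and the other steps.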
Pipelining and parallelism are not conflicting techniques; they can be complementary. For example, inside each parallel geometry processor, you are likely to find a floating-point pipeline.
Another trade-off when designing geometry transformation hardware is whether to hardwire it to perform one specific task or to design it more generally so that it executes its tasks under the control of specialized software, normally called microcode. Generally, hardwired logic runs faster, but it is very inflexible and, when the task is very complicated, can be harder to design. Hardwiring is best suited to tasks that are well defined and very unlikely to change. Rasterization and display hardware is often hardwired, whereas hardware for geometry transformation is usually microcoded.
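The distinction can be sketched in software as an analogy. In the example below, all names are hypothetical: `hardwired_transform` fixes its sequence of operations in the function body, while `run_microcode` is a generic engine that executes whatever program (a list of operation and argument pairs) is loaded into it. Real microcode operates at a much lower level; this only illustrates the flexibility trade-off.

```python
def hardwired_transform(v):
    """Fixed function: scale by 2, then translate by (1, 1, 1).
    Changing the behavior means rewriting the unit itself."""
    x, y, z = v
    return (2 * x + 1, 2 * y + 1, 2 * z + 1)


# Primitive operations that the hypothetical generic engine provides.
OPS = {
    "scale": lambda v, k: tuple(k * c for c in v),
    "translate": lambda v, t: tuple(c + d for c, d in zip(v, t)),
}


def run_microcode(program, v):
    """Generic engine: apply each (operation, argument) step in order."""
    for op_name, arg in program:
        v = OPS[op_name](v, arg)
    return v


# The same task expressed as a loadable program.
program = [("scale", 2), ("translate", (1, 1, 1))]
print(hardwired_transform((1, 2, 3)))     # (3, 5, 7)
print(run_microcode(program, (1, 2, 3)))  # (3, 5, 7)

# New behavior needs only a new program, not a new engine.
extended = program + [("scale", 10)]
print(run_microcode(extended, (1, 2, 3)))  # (30, 50, 70)
```

The generic engine pays for its flexibility with the overhead of looking up and dispatching each step, which loosely mirrors why hardwired logic can run faster.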